A data warehouse can be built from the top down or from the bottom up. To
build a top-down warehouse, you need to form a complete picture or
logical data model for the entire organization (or all the subsystems
within the scope of the project, such as all financial systems). In
contrast, building a warehouse from the bottom up takes a much more
departmental or specific business-area focus (for example, a sales
order system only). This breaks the task of modeling the data into more
manageable chunks. Such a departmental approach produces data marts
that are potentially subsets of the overall data warehouse. The
bottom-up approach can simplify implementation. It helps get
departmental or business-area information to the people who need it,
makes it easier to protect sensitive data, and results in better query
response times because data marts deal with less data than a voluminous
transactional system. The potential risk in the data mart approach is
that disparity in data mart implementation can result in a logically
disjointed enterprise data warehouse if efforts aren’t carefully
coordinated across the organization.
Before you embark on an OLAP database creation effort, invest time in
understanding the underlying requirements; it is the most valuable time
you can spend on the effort. If the scope is set correctly, you will be
able to achieve an industrial-strength OLAP design without much
difficulty. First, you need to take care of some groundwork:
1. Carefully assess the scope of what you want to represent in the BI environment. Start small, as the bottom-up approach suggests. For instance, just tackle the sales data facts.
2. Coordinate your efforts with other related BI efforts. Let people know that you are carving out a specific subject area or departmental data and, when you finish, publish your design to everyone.
3. Seek out any shared dimensions that might have already been created for other cubes. You want to leverage these as much as possible for the sake of data consistency and nonredundant processing.
4. Understand your data sources. The OLAP cube you create will be only as good as the data you put into it. It’s best to understand the dirty data issues of what you are about to touch long before you try to build an OLAP cube with it.
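One way to get an early read on dirty data in a source extract is a quick profiling pass before any cube design begins. The following is a minimal sketch in Python, assuming the extract has already been read into a list of dictionaries; the function name `profile_rows`, the field names, and the sample rows are hypothetical and exist only for illustration.

```python
from collections import Counter

def profile_rows(rows, key_field, required_fields):
    """Summarize common dirty-data symptoms in a list of dict rows."""
    key_counts = Counter(row.get(key_field) for row in rows)
    missing = Counter()
    for row in rows:
        for field in required_fields:
            value = row.get(field)
            # Treat None and blank strings as missing values.
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[field] += 1
    return {
        "row_count": len(rows),
        # Keys that appear more than once, in first-seen order.
        "duplicate_keys": [k for k, n in key_counts.items() if n > 1],
        "missing_by_field": dict(missing),
    }

# Hypothetical sales extract with one duplicate key and two missing values.
sales_rows = [
    {"order_id": 1, "product": "Widget", "amount": 10.0},
    {"order_id": 2, "product": "", "amount": 5.0},
    {"order_id": 2, "product": "Gadget", "amount": None},
]
report = profile_rows(sales_rows, "order_id", ["product", "amount"])
print(report)
```

A report like this does not fix anything; it simply tells you which source issues need to be resolved in the transformation logic before the data reaches the cube.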
An Analytics Mini-Methodology
To successfully build OLAP solutions, you should assess the requirements
of your end users in as much detail as possible. A mini-methodology that
focuses on the essential usages and characteristics of an analytics
solution can prove invaluable. The following sections outline a solid
approach to nailing down your BI requirements and producing OLAP designs
that meet your end users’ needs.
Assumption: You are building a business area–focused OLAP cube.
Requirements Phase
1. Identify the processing requirements for this decision support system (DSS). What analysis do you need to do? Are trend reporting, forecasting, and so on necessary? These can often be represented in use case form (via UML).
   a. Ask each user what business decision questions he or she needs to have answered.
   b. Ask each user how often he or she needs these questions answered and exactly when the questions must be answered.
   c. Ask each user how current the data must be to get accurate answers. (This speaks to data latency.)
2. Identify the data needed to fulfill these requirements. What data must be touched to provide answers? The best way to capture this type of information is a logical data model. Even a rough model is better than none at all. This is the point where you focus on the facts that need to be analyzed.
3. Identify all possible hierarchies and level representations (that is, aggregations). This is how the data is used. Most users are likely to tell you that they want to see product data in the product hierarchy structure that has already been set up (for example, product family, product groups).
4. Identify the time hierarchies that the users need. Because time is usually implicit, it just needs to be clarified in terms of levels of aggregation (for example, years, quarters, months, weeks, days) and whether it must follow a fiscal calendar, the Gregorian calendar, both, or something else.
5. Understand, from a security point of view, which data each user is allowed to view.
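The time hierarchy requirement is easier to discuss with users when you can show how a single date rolls up under each calendar. The sketch below derives Gregorian and fiscal levels for one date in Python. The function name `time_levels`, the July fiscal start, and the convention of labeling a fiscal year by the calendar year in which it ends are all assumptions chosen for illustration; your organization's fiscal rules may differ.

```python
from datetime import date

def time_levels(day, fiscal_start_month=7):
    """Return Gregorian and fiscal hierarchy levels for one date.

    Assumption: the fiscal year is labeled by the calendar year in
    which it ends (for example, July 2024 falls in fiscal year 2025).
    """
    quarter = (day.month - 1) // 3 + 1
    # Shift months so the fiscal start month becomes fiscal month 1.
    fiscal_month = (day.month - fiscal_start_month) % 12 + 1
    fiscal_quarter = (fiscal_month - 1) // 3 + 1
    if fiscal_start_month == 1 or day.month < fiscal_start_month:
        fiscal_year = day.year
    else:
        fiscal_year = day.year + 1
    # ISO weeks can belong to the neighboring year near January 1.
    iso_year, iso_week, _ = day.isocalendar()
    return {
        "year": day.year,
        "quarter": quarter,
        "month": day.month,
        "iso_year": iso_year,
        "iso_week": iso_week,
        "day": day.day,
        "fiscal_year": fiscal_year,
        "fiscal_quarter": fiscal_quarter,
        "fiscal_month": fiscal_month,
    }

levels = time_levels(date(2024, 8, 15))
print(levels)
```

Note that weeks do not nest cleanly inside months or quarters, which is why users who need weekly rollups often end up with a separate week-based hierarchy alongside the year, quarter, and month hierarchy.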
Design Phase
1. Analyze which data sources are needed to fulfill the requirements. See whether dimensions or OLAP cubes that already exist can be shared.
2. Understand what data transformations need to be done to the source data to provide it to the OLAP world. This might include pre-aggregation, reformatting, data integrity verifications, and so on.
3. Translate these requirements into an OLAP model design:
   a. Translate to MOLAP if your data sources are not going to be leveraged at all and you will be taking full advantage of OLAP storage.
   b. Translate to ROLAP if you are going to leverage an existing relational design and storage.
   c. Translate to HOLAP if you are going to partially utilize the source data storage and partially utilize OLAP storage. HOLAP is the most frequently used approach.
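The storage-mode guidance in step 3 amounts to a small decision rule, which can be written down as follows. This is only a sketch of the rule as stated above; the function name `choose_storage_mode` and its two Boolean parameters are hypothetical, and a real decision also weighs factors such as data volume and latency.

```python
def choose_storage_mode(uses_source_storage: bool, uses_olap_storage: bool) -> str:
    """Map the two storage questions from the design step to a mode name."""
    if uses_source_storage and uses_olap_storage:
        # Partly relational source storage, partly OLAP storage.
        return "HOLAP"
    if uses_olap_storage:
        # Source storage is not leveraged at all.
        return "MOLAP"
    if uses_source_storage:
        # The existing relational design and storage are leveraged.
        return "ROLAP"
    raise ValueError("At least one storage location must be used.")

print(choose_storage_mode(uses_source_storage=True, uses_olap_storage=True))
```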
Construction Phase
1. Implement data extraction, transformation, and loading (ETL) logic (via T-SQL, SSIS, or other methods).
2. Create the data sources to be used.
3. Create the dimensions.
4. Create the cube.
5. Select data measures (that is, the data facts) for the cube.
6. Design the storage and aggregations.
7. Process the cube. This brings the data into the OLAP environment.
8. Verify data integrity.
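The load-then-verify pattern in these steps can be illustrated on a very small scale. The sketch below uses Python's built-in `sqlite3` module as a relational stand-in; it does not use Analysis Services, and the table names (`src_sales`, `dim_product`, `fact_sales`), the sample rows, and the `reconcile` function are hypothetical. It loads a dimension and a fact table from a source table and then checks that row counts and measure totals survived the load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_sales (order_id INTEGER, product TEXT, amount REAL);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product TEXT UNIQUE);
CREATE TABLE fact_sales (order_id INTEGER, product_key INTEGER, amount REAL);
""")

# Hypothetical source rows.
conn.executemany(
    "INSERT INTO src_sales VALUES (?, ?, ?)",
    [(1, "Widget", 10.0), (2, "Gadget", 5.5), (3, "Widget", 4.5)],
)

# Load the dimension first, then the facts keyed to it.
conn.execute("INSERT INTO dim_product (product) SELECT DISTINCT product FROM src_sales")
conn.execute("""
INSERT INTO fact_sales (order_id, product_key, amount)
SELECT s.order_id, d.product_key, s.amount
FROM src_sales AS s
JOIN dim_product AS d ON d.product = s.product
""")

def reconcile(connection):
    """Compare row counts and measure totals between source and fact table."""
    query = "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM {}"
    source = connection.execute(query.format("src_sales")).fetchone()
    loaded = connection.execute(query.format("fact_sales")).fetchone()
    match = source[0] == loaded[0] and abs(source[1] - loaded[1]) < 1e-9
    return {"source": source, "loaded": loaded, "match": match}

result = reconcile(conn)
print(result)
```

The same kind of reconciliation, comparing counts and totals at the source against what the processed cube reports, is what the final verification step is meant to catch: rows dropped by a join, duplicated facts, or measures altered by a transformation.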
Implementation Phase
1. Define the security roles in the cube.
2. Train the users to use the system.
3. Process the data into the OLAP environment (from production data sources).
4. Verify data integrity.
5. Allow users to use the OLAP cube.
Maintenance Phase
1. Evaluate access optimization in the OLAP cube via usage analysis.
2. Do data mining discovery, if desired.
3. Make schema changes/enhancements, as necessary.
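Usage analysis boils down to finding which combinations of hierarchy levels users query most often, because those are the natural candidates for additional aggregations. The following Python sketch assumes you have already extracted a query log into a list of level combinations; the log format, the sample entries, and the function name `top_aggregation_candidates` are hypothetical.

```python
from collections import Counter

def top_aggregation_candidates(query_log, limit=2):
    """Rank level combinations by how often they were queried."""
    # Sort each combination so the order of levels in a query is ignored.
    counts = Counter(tuple(sorted(levels)) for levels in query_log)
    return counts.most_common(limit)

# Hypothetical log: each entry lists the levels one query grouped by.
query_log = [
    ("Month", "Product Family"),
    ("Product Family", "Month"),
    ("Day", "Product"),
    ("Month", "Product Family"),
    ("Quarter", "Region"),
]
candidates = top_aggregation_candidates(query_log, limit=1)
print(candidates)
```

Combinations that dominate the ranking are worth checking against the aggregations already designed for the cube; combinations that never appear may indicate aggregations that cost processing time without benefiting anyone.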